(57) ABSTRACT

A method of converting text to speech in a communication 
device includes providing a code table containing coded 
speech parameters. Next steps include inputting a text mes- 
sage into a communication device, and dividing the text 
message into phonics. A next step includes mapping each of 
the phonics against the code table to find the coded speech 
parameters corresponding to each of the phonics. A next step
includes processing the coded speech parameters corre- 
sponding to each of the phonics to provide an audio signal. 
In this way, text can be mapped directly to a vocoder table 
without intermediate translation steps. 

8 Claims, 3 Drawing Sheets 
TEXT-TO-SPEECH NATIVE CODING IN A
COMMUNICATION SYSTEM 

FIELD OF THE INVENTION 

The present invention relates generally to text-to-speech 
synthesis, and more particularly to text-to-speech synthesis 
in a communication system using native speech coding. 

BACKGROUND OF THE INVENTION 

Radio communication devices, such as cellular phones, 
are no longer viewed as voice only devices. With the advent 
of data based wireless services available to consumers, some 
serious problems arise for the conventional cellular phones. 
For example, cellular phones are currently only capable of 
presenting data services in text format on a small screen. 
This requires screen scrolling or other user manipulation in 
order to get the data or message. Also, compared to landline
systems, a wireless system has a much higher data error rate
and faces spectrum constraints, which makes providing 
real-time streaming audio, i.e. real-audio, to cellular users 
impractical. One way to deal with these problems is text- 
to-speech encoding. 

The process of converting text to speech is generally 
broken down into two major blocks: text analysis and speech 
synthesis. Text analysis is the process by which text is 
converted into a linguistic description that can be synthe- 
sized. This linguistic description generally consists of the 
pronunciation of the speech to be synthesized along with 
other properties that determine the prosody of the speech. 
These other properties can include (1) syllable, word, 
phrase, and clause boundaries; (2) syllable stress; (3) part- 
of-speech information; and (4) explicit representations of 
prosody such as are provided by the ToBI labeling system, 
as known in the art, and further described in 2nd Interna- 
tional Conference on Spoken Language Processing 
(ICSLP92): "ToBI: A Standard for Labeling English
Prosody", Silverman et al. (October 1992).

The pronunciation of speech included in the linguistic 
description is described as a sequence of phonetic units. 
These phonetic units are generally phones or phonics, which 
are particular physical speech sounds, or allophones, which 
are particular ways in which a phoneme may be expressed. 
(A phoneme is a speech sound perceived by the speakers of 
a language). For example, the English phoneme "t" may be 
expressed as a closure followed by a burst, as a glottal stop, 
or as a flap. Each of these represents different allophones of
"t". Different sounds that may be produced when "t" is
expressed as a flap represent different phonics. Other pho- 
netic units that are sometimes used are demisyllables and 
diphones. Demisyllables are half-syllables and diphones are 
sequences of two phonics. 

Speech synthesis can be generated from phonics using a 
rule-based system. For example, the phonetic unit has
target phoneme acoustic parameters (such as duration and
intonation) for each segment type, and has rules for smooth-
ing the parameter transitions between the segments. In a
typical concatenation system, the phonetic component has a 
parametric representation of a segment occurring in natural 
speech and concatenates these recorded segments, smooth- 
ing the boundaries between segments using predefined rules. 
The speech is then processed through a vocoder for trans-
mission. Voice coders, such as vector-sum or code excited
linear prediction (CELP) vocoders, are in general use in
digital cellular communication devices. For example, U.S.
Pat. No. 4,817,157, which is hereby incorporated by
reference, describes such a vocoder implementation as used
for the Global System for Mobile (GSM) communication 
system among others. 

Unfortunately, the text-to-speech process as described
above is computationally complex and extensive. For
example, in existing digital communication devices,
vocoder technology already uses the limits of computational
power in a device in order to maintain voice quality at its
highest possible level. However, the text-to-speech process
described above requires further signal processing in addi-
tion to the vocoder processing. In other words, the process
of converting text to phonics, applying acoustic parameter
rules for each phonic, concatenation to provide a voiced
signal, and voice coding requires more processing power than
just voice coding alone.

Accordingly, there is a need for an improved text-to- 
speech coding system that reduces the amount of signal 
processing required to provide a voiced output. In particular, 
it would be of benefit to be able to use the existing native
speech coding incorporated into a communication device. It
would also be advantageous if current low-cost technology 
could be used without the requirement for customized 
hardware. 

SUMMARY OF THE INVENTION

The present invention finds use in communication 
devices, such as radiotelephones for example, that have 
audio capabilities that can take advantage of text-to-speech 
conversion of text messages. 

One aspect of the present invention uses an existing 
vocoder with a stored code table containing coded speech 
parameters for use in text-to-speech conversion. These 
native speech parameters in a communication device can be 
used without the need to create and store new speech 
parameters. Instead, the native parameters can be modified 
if and when needed, such as to provide more natural- 
sounding language for example. 

Another aspect of the present invention involves dividing 
the text messages into phonics, spaces, and special
characters, and wherein white noise is used to emulate 
spaces between words of text. This saves time and code 
processing for non-phonics that do not contain any speech 
information. 

Another aspect of the present invention involves the
division of text into phonics which can be mapped against 
native coded speech parameters used in existing communi- 
cation systems. For example, each distinct phonic can be 
mapped with a memory location index of predefined phonics
in a look-up table to point to a digitized wave file defining
equivalent native coded speech parameters from the code
table. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 shows a flow chart of a text-to-speech system, in
accordance with the present invention; 

FIG. 2 shows a simplified block diagram of a text-to-
speech system, in accordance with the present invention;
and 

FIG. 3 shows a flow chart of a preferred embodiment of
a text-to-speech system, in accordance with the present 
invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS

The present invention provides an improved text-to- 
speech system that reduces the amount of signal processing 
required to provide a voiced output by taking advantage of
the digital signal processor (DSP) and sophisticated speech
coding algorithms that already exist in cellular phones. In
particular, the present invention provides a system that
converts an incoming text message into a voice output using
the native cellular speech coding and existing hardware of a
communication device, without an increase in memory
requirements or processing power. 

Advantageously, the present invention utilizes the existing
data interface between the microprocessor and DSP in a
cellular radiotelephone along with existing software capa-
bilities. In addition, the present invention can be used in
conjunction with any text based data services, such as Short
Messaging Service (SMS) as used in the Global System for
Mobile (GSM) communication system, for example. Con-
ventional cellular handsets have the following functional-
ities in place: (a) an air-to-air interface to retrieve text
messages from remote service providers, (b) software to
convert received binary data into appropriate text format, (c)
audio server software to play audio to output devices, such
as speakers or earphones for example, (d) a highly efficient
audio compression coding system to generate human voice
through digital signal processing, and (e) a hardware inter-
face between a microprocessor and a DSP. When receiving
a text-based data message, a conventional cellular handset
will convert the signal to text format (ASCII or Unicode), as
is known in the art. The present invention converts this
formatted text string to speech. Alternatively, a network
server of the communication system can convert this for-
matted text string to speech and transmit this speech to a
conventional cellular handset over a voice channel instead of
a data channel.

FIGS. 1 and 2 show a method and system for converting 
text-to-speech in accordance with the present invention. In 
a preferred embodiment, the text will be converted to coded 
speech parameters native to the communication system, 
saving the processing steps of converting text-to-voice and 
then running the voice signal through a vocoder. In the 
method of the present invention, a first step 102 includes 
providing a code table 202 containing coded speech param-
eters. Such code tables are known in the art and typically
include Code Excited Linear Prediction (CELP) and Vec-
tor Sum Excited Linear Prediction (VSELP) among others.
The code table 202 is stored in a memory. In effect, a code
table contains compressed audio data representing critical
speech parameters. As a result, the digital transfer of audio
information can be encoded and decoded using these code
tables to reduce bandwidth, providing more efficiency with-
out a noticeable loss in voice quality. A next step 104 in the
process is inputting a text message. Preferably, the text 
message is formatted in an existing format that can be read 
by the communication system without requiring hardware or 
software changes. 

A next step 106 includes dividing the text message into 
phonics by an audio server 204. The audio server 204 is 
realized in the microprocessor or DSP of the cellular 
handset, or can be done in the network server. In particular,
the text message is processed in an audio server 204 that is
software based on a rule table for a particular language
tailored to recognize the structure and phonemes of that
language. The audio server 204 breaks the sentences of the 
text into words by recognizing spaces and punctuation, and 
further divides the words into phonics. Of course, a data 
message may contain other characters besides letters or may 
contain abbreviations, contractions, and other deviations 
from normal text. Therefore, before breaking a text message 
into sentences, these other characters or symbols, e.g. "$", 
numbers and common abbreviations, will be translated into
their corresponding words by the audio server. To emulate
the pause between each word in human speech, white noise
is inserted between each word. For example, a 15 ms period
of white noise has been found adequate to separate words.
Optionally, the text can contain special characters. The
special characters include modifying information for the
coded speech parameters, wherein after mapping the modi-
fying information is applied to the coded speech parameters
in order to provide a more natural-sounding speech signal. For
example, a special character (such as an ASCII symbol for
example) can be used to indicate the accent or inflection of
a word. For instance, the word "manual" can be represented
"manual^" in text. The audio server software can then tune
the phonetic to make the speech closer to a naturally 
inflected voice. This option requires the text messaging 
service or audio server to provide such special characters. 
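
The dividing step can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the rule table of the audio server 204: the expansions in `EXPANSIONS`, the small inventory `PHONICS`, the greedy longest-match rule, and the `WHITE_NOISE` marker are all hypothetical names introduced for the example, and numbers are simply dropped rather than spelled out.

```python
import re

# Hypothetical symbol/abbreviation expansions (assumed, not from the source).
EXPANSIONS = {"$": " dollars ", "&": " and ", "dr.": "doctor"}

# Hypothetical phonic inventory; a real rule table is language specific.
PHONICS = ["th", "sh", "ch", "ee", "oo"] + list("abcdefghijklmnopqrstuvwxyz")

WHITE_NOISE = "<noise>"  # marker standing in for a short white-noise segment


def normalize(text):
    """Translate symbols and abbreviations into words, lower-cased."""
    text = text.lower()
    for symbol, word in EXPANSIONS.items():
        text = text.replace(symbol, word)
    return text


def split_word(word):
    """Divide one word into phonics by greedy longest match."""
    phonics, i = [], 0
    ordered = sorted(PHONICS, key=len, reverse=True)
    while i < len(word):
        match = next((p for p in ordered if word.startswith(p, i)), None)
        if match is None:  # character outside the inventory: skip it
            i += 1
            continue
        phonics.append(match)
        i += len(match)
    return phonics


def divide(text):
    """Return the phonics of the text with WHITE_NOISE between words."""
    words = re.findall(r"[a-z]+", normalize(text))
    out = []
    for n, word in enumerate(words):
        if n:  # a pause goes between words, not before the first one
            out.append(WHITE_NOISE)
        out.extend(split_word(word))
    return out
```

Calling `divide("The ship")` yields the phonics of each word separated by one white-noise marker. A production rule table would also handle numbers, contractions, and the optional special characters described above.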

After linguistic analysis, a next step 108 includes map-
ping each of the phonics from the audio server, by a mapping
unit 206, against the code table 202 to find the coded speech 
parameters corresponding to each of the phonics. In 
particular, each phonic is mapped into a corresponding 
digitized voice waveform that is compressed in the format 
that is native to a particular cellular system. For instance, in
the GSM communication system, the native format can be 
the half rate vocoder format, as is known in the art. More 
particularly, each phonic has a predetermined digitized 
waveform, in the communication system native format, 
pre-stored in the memory. The audio server 204 determines 
a phonic, and the mapping unit 206 matches each distinct 
phonic with a memory location index of predefined phonics 
in a look-up table 212 to point to a digitized wave file 
defining the equivalent native coded speech parameters from 
the code table 202. Preferably, the look-up table 212 is used 
to map individual phonics into the memory location of the 
compressed and digitized audio in the existing code table of 
the vocoder of the cellular phone. For the English language, 
the look-up table size is slightly less than one megabyte with 
the GSM voice compression algorithm. 

For example, there are about 4119 possible phonic com- 
binations in English or a similar language. On average, the 
speed of the speech is about 200 words/min (about 500 
phonics per minute and 6.7 phonics per second), thus each
phonic lasts 0.15 s. With an 8 kHz sample rate and a 16-bit
resolution, there are about 2400 bytes/phonic (0.15 s × 8
kHz × 2 bytes). With the 10:1 vocoder compression used in
the GSM, the compressed digitized voice will be around 240 
bytes/phonic. Thus, with about 4119 phonics the total size of 
the look-up table is about 989 kbytes for each language. 
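
The size estimate above can be checked with a few lines of arithmetic. The following Python sketch only restates the figures given in the text; the constant and function names are hypothetical, and the phonic duration is expressed in milliseconds so that the calculation stays in integers.

```python
PHONIC_COUNT = 4119        # possible phonic combinations, from the text
PHONIC_DURATION_MS = 150   # 0.15 s per phonic
SAMPLE_RATE_HZ = 8000      # 8 kHz sample rate
BYTES_PER_SAMPLE = 2       # 16-bit resolution
COMPRESSION_RATIO = 10     # 10:1 vocoder compression


def table_size():
    """Return (raw bytes/phonic, compressed bytes/phonic, total bytes)."""
    raw = PHONIC_DURATION_MS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE // 1000
    compressed = raw // COMPRESSION_RATIO
    total = PHONIC_COUNT * compressed
    return raw, compressed, total
```

With these inputs the raw size is 2400 bytes per phonic, the compressed size is 240 bytes per phonic, and the total is 988,560 bytes, which rounds to the 989 kbytes stated above.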

The mapping unit (which can also be the audio server) can 
then assemble the digitized representations of the phonics, 
along with white noise for spaces between words, into a 
string of data using the knowledge of the word and sentence
structure learned from breaking the text into phonics.
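
The mapping and assembly can be sketched as follows. This Python is an illustration under stated assumptions: `CODE_TABLE` is placeholder data rather than real vocoder output, the `LOOKUP` entries and their offsets are invented for the example, and `NOISE_FRAME` stands in for whatever coded white-noise segment a real handset would use.

```python
# Placeholder bytes standing in for pre-stored compressed audio (assumed).
CODE_TABLE = bytes(range(256)) * 4

# Hypothetical look-up table: phonic -> (offset, length) within CODE_TABLE.
LOOKUP = {"th": (0, 8), "e": (8, 6), "sh": (14, 8), "i": (22, 5), "p": (27, 4)}

WHITE_NOISE = "<noise>"   # marker for a space between words
NOISE_FRAME = bytes(3)    # placeholder for a coded white-noise segment


def map_phonic(phonic):
    """Return the coded speech parameters stored for one phonic."""
    offset, length = LOOKUP[phonic]
    return CODE_TABLE[offset:offset + length]


def assemble(phonics):
    """Concatenate coded phonics, inserting noise where a space occurs."""
    out = bytearray()
    for p in phonics:
        out += NOISE_FRAME if p == WHITE_NOISE else map_phonic(p)
    return bytes(out)
```

The returned byte string is what would be handed to the signal processor for decompression; no waveform synthesis or re-encoding takes place in this path.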

In a next step 110, the native coded speech parameters, 
corresponding to each of the phonics from the previous step 
and along with suitable spaces, are subsequently processed 
in a signal processor 208 (such as a DSP for example) to
provide a decompressed speech signal to an audio circuit
210 of the cellular phone handset, which includes an audio
transducer. Inasmuch as the phonics are already coded in
native parameters, the DSP needs no modification to prop-
erly provide a speech signal. To take advantage of the
existing DSP capability, the coding system used for speech
synthesis should be native to a particular cellular phone
standard, since the DSP and its software are designed to
decompress that particular coding format in an existing
vocoder. For instance, in GSM-based handsets, digitized
audio should be stored in the full-rate vocoder coding
format, and can be stored in half-rate vocoder coding format.
If the interface between a DSP and a microprocessor is
shared memory, the audio file can be directly placed into the
shared memory. Once the sentence is assembled, an interrupt
will be generated to trigger a read by DSP, which in turn will
decompress and play the audio. If the interface is a serial or
parallel bus, the compressed audio will be stored in a RAM
buffer until sentence is complete. After that, the micropro-
cessor will transfer the data to DSP for decompression and
playing.

Preferably, the above steps are repeated for each sentence
in the inputted text. However, it can be repeated for each
phonic or up to the length of the available memory. For
example, a paragraph, page or entire text can be inputted
before being divided into phonics. In one embodiment, a
transmitting step is included after the mapping step 108.
This transmitting step includes transmitting the coded
speech parameters from a network server to a wireless
communication device, and wherein the processing step is
performed in the wireless communication device and all the
previous steps 102-108 are performed in the network server.
However, in a preferred embodiment, all the steps 102-110
are performed within a wireless communication device. The
text message itself can be provided by a network server or
another communication device.

Unlike desktop and laptop computers, a cellular radio-
telephone is a hand held device very sensitive to size, weight
and cost. Thus, the hardware to realize the text-to-speech
conversion of the present invention should use minimal
number of parts and at low cost. The look-up table of the
phonics should be stored in flash memory for its non-
volatility and high density. Because the flash memory cannot
be addressed randomly, the digital data of the phonics need
to be loaded into the random memory before being sent to
the DSP. The simplest way is to map the whole look-up table
into the random memory, but this requires at least one
megabyte of memory for a very simple look-up table.
Another option is to load one sector from flash memory into
the random memory at a time, but this still requires 64
kbytes of extra random memory.

For the purpose of minimizing the requirement of the
memory, the following approach can be used, referring to
FIG. 3: laying out 300 an intermediate array in random
memory as a look-up table, (a) find 301 the starting and the
ending addresses of the phonics in the look-up table, (b) save
302 the starting and the ending addresses in the micropro-
cessor registers, (c) use 303 one microprocessor register as
a counter, with the counter being set to zero before reading
the look-up table from the flash memory, adding one count
to the counter for each read cycle, (d) read 304 one single
byte or word of the look-up table from the flash memory in
a non-synchronized mode or in a synchronized mode at a
low clock frequency, so that the microprocessor can have
enough time to perform necessary operation between the
read cycles, and (e) use the microprocessor register to store
305 the one byte/word of data in the intermediate array,
comparing 306 the counter value with starting address. If the
counter value is less than the starting address, go back to the
reading step 304 and read the next byte/word from the flash
memory. If the counter value is equal or greater than the
starting address, compare 307 the counter value with the
ending address. If the counter value is less than the ending
address, move the data from the microprocessor register into
the random memory. If the counter value is greater than the
ending address, go back to the reading step 304 and finish
the reading to the end of the current flash memory sector. In
this way, the requirement of the random memory can be
limited to the size of 200 bytes. Thus, no additional random
memory is required for even the simplest cellular phone
handsets.

In the above example, phonics-digitized audio files are
[...] audio files stored on the same memory sector once it is
loaded into the RAM. Instead of loading one memory page
for one phonic then loading another page for next phonic, an
intermediate array can be assembled that contains the
memory locations of all phonics in a sentence. Table 1 shows
a phonic-to-memory location look-up table.

TABLE 1

Consider a sentence, "AB C", with a space between B and
C. In a direct method, page 3 will be loaded into RAM, then
copy 200 bytes starting at location 210 to a memory buffer.
Page 4 is then loaded, copy 180 bytes into a buffer starting
at location 1500. Then copy a digitized white noise segment
into the buffer, after that load page 3 again, copy 150 bytes
starting at location 1000 into the buffer. The text string is
then converted to audio. An indirect method can also be
used. The difference between the direct and indirect method is
that in direct method the software will not look ahead.
Therefore, in the above example, (AB C), software will load
page 3, locate and copy A, then load page 4 and locate and
copy B, then reload page 3 and locate and copy C, while in
the indirect method, software will load page 3 and copy both
A and C into a pre-allocated memory buffer, then load page
4 and copy B into the buffer. In this way, only a two page
load is required which saves time and processor power.

With an intermediate mapping method, "AB C" is trans-
lated to a memory location array, {3:210:200, 4:1500:180,
3:1000:150}. A memory buffer to store digitized audio is
created based upon the total size required, in this case the
sum of three phonics (200+180+150) plus a white noise
segment for the space. After loading page 3 into memory, the
memory location array is searched to locate all the audio
files that are stored on this page, in this case A and C, which
are then copied to their respective locations in the memory
buffer. With this method, we can significantly cut down the
memory access time and improve the efficiency.

In practice, the present invention uses existing text based
messaging services in a communication system. SMS (Short
message service) is a popular text based message service for
GSM system. Under certain situations, i.e. driving or it
being too dark to read, converting a text message into speech
is very desirable. In addition, all current menu, phone book
and operational prompts are in text format in current cellular
handsets. It is not possible for the visually impaired to
navigate through these visual prompts. The text-to-speech
(TTS) system as described above solves this problem.
Instead of sending data in bandwidth intensive voice format
(although this can also be used), the present invention allows
the use of the many communication services having a low
data rate text format, such as SMS for example. This can be
used to advantage in real time driving directions, audio
news, weather, location services, real time sports or breaking
newscasts in text. TTS technology also opens a door for
voice game application in cellular phones at very low cost.
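
The two memory-saving techniques described earlier, the counter-based sequential read of FIG. 3 and the indirect (look-ahead) page loading, can be sketched in a few lines. This Python is a rough illustration under stated assumptions rather than handset firmware: the function names, the `load_page` callback, and the treatment of the ending address as exclusive are all choices made for the example.

```python
from collections import defaultdict


def read_range(flash, start, end):
    """Read `flash` one byte per cycle and keep bytes in [start, end).

    A counter is incremented on every read cycle; only bytes whose
    counter value lies between the two addresses are moved into RAM.
    """
    ram = bytearray()
    counter = 0
    for byte in flash:               # one read cycle per byte
        if start <= counter < end:
            ram.append(byte)         # move the register contents to RAM
        counter += 1
    return bytes(ram)


def assemble_indirect(entries, load_page, noise):
    """Fill a pre-allocated buffer, loading each page only once.

    entries: (page, offset, size) tuples, or None where a space occurs.
    Returns the assembled buffer and the number of page loads.
    """
    sizes = [len(noise) if e is None else e[2] for e in entries]
    starts = [sum(sizes[:i]) for i in range(len(sizes))]
    buffer = bytearray(sum(sizes))
    by_page = defaultdict(list)
    for pos, entry in enumerate(entries):
        if entry is None:
            buffer[starts[pos]:starts[pos] + len(noise)] = noise
        else:
            by_page[entry[0]].append(pos)
    loads = 0
    for page, positions in by_page.items():
        data = load_page(page)       # each distinct page is loaded once
        loads += 1
        for pos in positions:
            _, offset, size = entries[pos]
            buffer[starts[pos]:starts[pos] + size] = data[offset:offset + size]
    return bytes(buffer), loads
```

For the "AB C" example, passing the memory location array as `[(3, 210, 200), (4, 1500, 180), None, (3, 1000, 150)]` results in two page loads instead of the three needed by the direct method, with A and C copied from page 3 in a single pass.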

Moreover, TTS can use much lower bandwidth with text 
based messaging. It will not load the network and worsen the 
capacity strain on existing or future cellular networks. 
Further, the present invention allows incumbent network 
operators to offer a wide range of value-added services with 
the text messaging capabilities that already exist in their
networks, instead of having to purchase licenses for new
bandwidth and investing in new equipment. This also
applies to third party service providers that, under today's 
and proposed technologies, face even higher obstacles than 
network operators in providing any kind of data services to 
cellular phone users. Since TTS can be used with any 
standard text messaging services, anyone with the access to 
text-messaging gateways can provide a variety of services to 
millions of cellular phone users. With the technology and 
equipment barrier removed, many new business opportuni- 
ties will be opened up to the independent third party appli- 
cation providers. 

Like existing mobile web applications, the mobile TTS 
application also requires network server support. The server 
should be optimized based on the data traffic and the cost per
user. The major daily cost of the local server is the data
traffic. Low data traffic reduces the server return on invest-
ment and the daily cost. The present invention can increase
low data traffic and moderate data traffic since text does not
need to be sent "on demand" when data traffic bandwidth
may be unavailable, but can wait for a period of lower,
available data traffic. 

Although the invention has been described and illustrated 
in the above description and drawings, it is understood that 
this description is by way of example only and that numer-
ous changes and modifications can be made by those skilled
in the art without departing from the broad scope of the 
invention. Although the present invention finds particular 
use in portable cellular radiotelephones, the invention could 
be applied to any communication device, including pagers, 
electronic organizers, and computers. The present invention 
should be limited only by the following claims. 

What is claimed is: 

1. A method of converting text to speech in a communi- 
cation device operable to receive text messages and having 
a vocoder with a stored code table containing coded speech 
parameters, the method comprising the steps of: 

inputting a text message into the communication device; 
dividing the text message into phonics; 
mapping each of the phonics against the existing vocoder 
code table to find the coded speech parameters corre- 
sponding to each of the phonics by matching each 
distinct phonic with a memory location index of pre- 
defined phonics in a look-up table to point to a digitized 
wave file defining equivalent native coded speech 
parameters from the code table comprising the substeps 
of: 

providing an intermediate array in random memory as 
a look-up table; 

finding the starting and the ending addresses of the 
phonics in the look-up table; 

saving the starting and the ending addresses in
microprocessor registers;

using one microprocessor register as a counter, with the
counter being set to zero before reading the look-up
table, and adding one count to the counter for each
read cycle;

reading one single byte/word from the look-up table
from flash memory;

storing the one byte/word of data in a microprocessor
register, and

comparing the counter value with starting address,
wherein

if the counter value is less than the starting address,
go back to the reading step to read the next
byte/word from memory, and

if the counter value is equal or greater than the
starting address, comparing the counter value with
the ending address, wherein

if the counter value is less than the ending
address, moving the data from the said step of
storing the one byte/word of data in a
microprocessor register into the intermediate array
random memory, and

if the counter value is greater than the ending
address, go back to the reading step and finish
the reading to the end of the memory; and

subsequently processing the coded speech parameters
corresponding to each of the phonics from the previous
step in the vocoder of the communication device to
provide a speech signal from the communication
device.

2. The method of claim 1, wherein the dividing step
includes dividing the text messages into phonics, spaces, and 
special characters, and wherein spaces are emulated with 
white noise. 

3. The method of claim 2, wherein the special characters 
of the dividing step include modification information for the
coded speech parameters, and wherein after the mapping 
step further comprising a step of 

applying the modification information to the coded speech 
parameters in order to provide more natural-sounding 
speech signal from the processing step.

4. The method of claim 1, wherein in the providing step 
the code table includes one of code excited linear prediction 
parameters or vector sum excited linear prediction param- 
eters. 

5. A communication device for converting text-to-speech,
the device operable to receive text messages and having a
vocoder with a stored code table containing coded speech
parameters, the communication device comprising:

an audio server that converts input text into phonics;

a mapping unit that maps each of the phonics against the
existing vocoder code table to find the coded speech
parameters corresponding to each of the phonics by
matching each distinct phonic with a memory location
index of predefined phonics in a look-up table to point
to a digitized wave file defining equivalent native coded
speech parameters from the code table by having the
mapping unit:

provide an intermediate array in random memory as a
look-up table;

find the starting and the ending addresses of the phonics
in the look-up table;

save the starting and the ending addresses in
microprocessor registers;

use one microprocessor register as a counter, with the
counter being set to zero before reading the look-up
table, and adding one count to the counter for each
read cycle;

read one single byte/word from the look-up table from
flash memory;

store the one byte/word of data in a microprocessor
register, and

compare the counter value with starting address,
wherein

if the counter value is less than the starting address,
read the next byte/word from memory, and

if the counter value is equal or greater than the
starting address, compare the counter value with
the ending address, wherein

if the counter value is less than the ending
address, move the data that was stored as one
byte/word of data in the microprocessor register
from the microprocessor register into the
intermediate array random memory, and

if the counter value is greater than the ending
address, go back to reading and finish reading
to the end of the memory; and

a signal processor incorporated in the vocoder of the
communication device that processes the coded speech
parameters corresponding to each of the phonics to
provide a speech signal from the communication
device.

6. The communication device of claim 5, wherein the 
audio server converts the input text into phonics, spaces and 
special characters, and wherein spaces are emulated with
white noise. 

7. The communication device of claim 5, wherein the 
audio server converts the input text into phonics, spaces and 
special characters that include modification information for 
the coded speech parameters, and applies the modification 
information to the coded speech parameters in order to
provide a more natural-sounding speech signal.

8. The communication device of claim 5, wherein the 
code table is an existing code table used in a vocoder of the 
communication system. 

* * * * *